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Abstract 

The  sexual  stages  of  malarial  parasites  are  essential  for  the  mosquito  transmission  of  the  disease  and  therefore  are  the  focus  of  transmission¬ 
blocking  drug  and  vaccine  development.  In  order  to  better  understand  genes  important  to  the  sexual  development  process,  the  transcriptomes  of 
high-purity  stage  I-V  Plasmodium  falciparum  gametocytes  were  comprehensively  profiled  using  a  full-genome  high-density  oligonucleotide 
microarray.  The  interpretation  of  this  transcriptional  data  was  aided  by  applying  a  novel  knowledge-based  data-mining  algorithm  termed 
ontology-based  pattern  identification  (OPI)  using  current  information  regarding  known  sexual  stage  genes  as  a  guide.  This  analysis  resulted 
in  the  identification  of  a  sexual  development  cluster  containing  246  genes,  of  which  ~T5%  were  hypothetical,  exhibiting  highly-correlated, 
gametocyte-specific  expression  patterns.  Inspection  of  the  upstream  promoter  regions  of  these  246  genes  revealed  putative  cA-regulatory  ele¬ 
ments  for  sexual  development  transcriptional  control  mechanisms.  Furthermore,  OPI  analysis  was  extended  using  current  annotations  provided 
by  the  Gene  Ontology  Consortium  to  identify  380  statistically  significant  clusters  containing  genes  with  expression  patterns  characteristic  of 
various  biological  processes,  cellular  components,  and  molecular  functions.  Collectively,  these  results,  available  as  part  of  a  web-accessible 
OPI  database  (http://carrier.gnf.org/publications/Gametocyte),  shed  light  on  the  components  of  molecular  mechanisms  underlying  parasite 
sexual  development  and  other  areas  of  malarial  parasite  biology. 
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1.  Introduction 

Malaria  continues  to  be  a  devastating  infectious  disease, 
responsible  for  approximately  500  million  clinical  episodes 
and  millions  of  deaths  each  year  worldwide  [1].  Development 
of  drugs  and  vaccines  to  combat  the  disease  traditionally  has 
focused  heavily  on  the  intraerythrocytic  stages  of  the  malar¬ 
ial  parasite  life  cycle,  as  these  stages  are  responsible  for  the 
clinical  symptoms  associated  with  the  illness.  However,  in 
recent  decades,  it  has  become  apparent  that  any  successful 
strategy  for  controlling  malaria  will  most  likely  require  a 
multifaceted  approach  that  also  includes  drugs  and  vaccines 
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against  other  stages  of  the  complex  parasite  life  cycle.  Con¬ 
sequently,  the  sexual  stages  of  the  parasite  essential  for  the 
mosquito  transmission  of  the  disease  are  considered  attrac¬ 
tive  targets  for  the  development  of  new  transmission-blocking 
drugs  and  vaccines  that  aim  to  prevent  the  spread  of  malaria 
in  human  populations  [2,3].  Such  strategies  are  thought  to 
be  especially  promising,  as  it  has  been  hypothesized  that  the 
parasite  maybe  more  vulnerable  to  vaccine  and  drug  interven¬ 
tion  during  the  sexual  part  of  its  life  cycle  due  to  its  passage 
through  a  numerical  bottleneck  and  the  limited  exposure  to 
the  human  immune  system  it  receives  during  these  stages  [3]. 

Of  the  four  Plasmodium  parasite  species  responsible  for 
malaria  in  humans,  the  progression  of  sexual  development  is 
best  understood  morphologically  for  the  most  lethal  species, 
Plasmodium  falciparum  [4,5].  The  switch  from  an  asexual 
to  sexual  mode  of  replication  begins  in  the  haploid  intraery- 
throcytic  stages,  where  a  sub-population  of  asexual  parasites 
begin  to  develop  into  male  and  female  gametocytes.  This  pro¬ 
cess  of  gametocyte  development  continues  in  the  human  host 
over  a  period  of  approximately  10  days,  encompassing  five 
morphologically  defined  gametocyte  stages  (stage  I-V)  and 
ending  with  the  formation  of  mature  male  and  female  gameto¬ 
cytes.  When  mature  gametocytes  are  taken  up  by  a  mosquito 
as  part  of  the  bloodmeal  the  process  of  sexual  development 
continues  in  the  mosquito  midgut,  where  male  gametocytes 
undergo  exflagellation  to  form  highly  motile  male  gametes 
and  female  gametocytes  enlarge  and  emerge  from  red  blood 
cells  to  form  female  gametes.  The  subsequent  fusion  of  a 
single  male  and  female  gamete  results  in  fertilization  and 
the  formation  of  a  diploid  zygote  that  then  differentiates  into 
a  motile  ookinete.  It  is  this  ookinete  stage  that  transverses 
the  mosquito  midgut  wall  to  form  an  oocyst,  where  asexual 
sporogonic  development  is  once  again  initiated. 

Despite  this  detailed  morphological  description,  compre¬ 
hensive  understandings  of  most  of  the  fundamental  biological 
mechanisms  driving  the  parasite  sexual  development  process 
remain  elusive.  For  example,  we  still  do  not  fully  under¬ 
stand  how  gene  and  protein  expression  is  regulated  during 
sexual  development  and  what  specific  metabolic  differences 
exist  between  the  asexual  and  sexual  stages.  As  a  result,  cur¬ 
rent  transmission-blocking  vaccine  development  has  focused 
on  only  a  handful  of  known  sexual-specific  proteins,  and 
chemotherapeutics  that  selectively  kill  sexual  stage  parasites 
have  yet  to  be  discovered  [2,3]. 

The  availability  of  the  full  genome  sequence  for  P. 
falciparum  since  2002  has  allowed  for  the  investigation  of 
gene  functions  in  the  malarial  parasite  using  high-throughput 
genomic  and  proteomic  approaches  that  circumvent  some 
obstacles  associated  with  the  application  of  more  traditional 
genetic  and  biochemical  methods  to  malaria  research  [6-8]. 
Previously,  we  conducted  a  genome-wide  transcriptional 
analysis  of  various  stages  of  the  P.  falciparum  life  cycle 
using  a  high-density  oligonucleotide  microarray  coupled 
with  a  k-means  clustering  approach  to  identify  15  groups 
of  genes  sharing  mRNA  expression  patterns  characteristic 
of  various  biological  processes  such  as  antigenic  variation 


and  cell  invasion  [9].  In  an  analogous  study,  Bozdech  and 
co-workers  utilized  a  spotted  oligonucleotide  array  and 
Fourier  transformation  analysis  to  demonstrate  that  ~80% 
of  the  P  falciparum  genes  expressed  during  the  asexual 
intraerythrocytic  cell  cycle  exhibit  a  single-peak  periodic 
pattern  in  their  transcript  levels,  with  genes  involved  in 
common  biological  processes  also  sharing  similar  phase 
[10].  Together,  these  studies  provided  many  novel  insights 
into  potential  functions  for  the  approximately  3000  hypo¬ 
thetical  genes  in  the  P.  falciparum  genome.  However,  due 
to  the  inclusion  of  only  a  single  late-stage  gametocyte  time 
point  in  the  former  work,  insights  provided  by  these  studies 
regarding  sexual  stage  gene  function  were  limited. 

To  better  understand  genes  involved  in  P.  falciparum  sex¬ 
ual  development  using  an  expression  microarray  approach, 
we  have  obtained  new  transcriptional  data  on  detailed  time 
courses  of  gametocyte  development,  including  high-purity 
early-stage  gametocytes.  To  aid  in  the  interpretation  of  this 
new  microarray  data  within  the  context  of  our  previous 
asexual  stage  data  [9],  we  applied  a  recently  developed 
knowledge-based  clustering  algorithm  called  ontology-based 
pattern  identification  (OPI)  [11].  OPI  utilizes  classifications 
of  gene  function  in  the  form  of  systematic  gene  annotations 
to  generate  gene  clusters  with  greater  specificity  and  statis¬ 
tical  confidence  than  is  possible  using  more  routinely-used 
clustering  methods  such  as  the  k-means  clustering  approach 
employed  in  the  past  [7,9,10].  For  example,  hierarchical  or 
k-means  methods  require  the  user  to  arbitrarily  select  a  corre¬ 
lation  coefficient,  a  fold-change,  or  a  number  of  k-clusters  to 
determine  cluster  sizes.  In  contrast,  OPI  empirically  selects 
the  best  parameters  to  give  the  highest  concentration  of  genes 
belonging  to  a  particular  classification  in  the  smallest  cluster 
size.  Using  manually-assigned  gene  annotations  for  sexual 
development  to  guide  OPI,  a  cluster  of  246  genes  possess¬ 
ing  highly  gametocyte-specific  mRNA  expression  patterns 
was  identified.  Furthermore,  inspections  of  promoter  regions 
upstream  of  these  genes  revealed  conserved  sequences  that 
potentially  play  roles  as  cA-regulatory  elements  in  sexual 
development  transcriptional  control  mechanisms.  Beyond 
sexual  development,  OPI  was  also  extended  to  identify  genes 
most  likely  to  be  involved  in  380  biological  processes,  cellu¬ 
lar  components,  and  molecular  functions  using  information 
provided  by  the  Gene  Ontology  (GO)  Consortium  [12].  Col¬ 
lectively,  these  381  clusters,  accessible  as  part  of  an  OPI 
database  at  http://carrier.gnf.org/publications/Gametocyte, 
provide  new  insights  into  P  falciparum  gene  function  with 
direct  implications  for  rational  drug  and  vaccine  development 
against  malaria. 

2.  Materials  and  methods 

2.1.  Parasite  cultivation 

Early  passages  of  P.  falciparum  clone  3D7  and  isolate 
NF54  were  cultured  with  human  A+  or  0+  erythrocytes, 
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respectively,  as  previously  described  [13].  Prior  to  induction 
of  gametocyte  development,  the  cultures  underwent  two 
rounds  of  synchronization  with  5%  D-sorbitol  [14].  Synchro¬ 
nized  sexual  development  was  induced  by  a  sudden  increase 
to  the  hematocrit  of  a  fast  growing  ring  stage  culture  in  the 
presence  of  partially-spent  medium.  Fresh  erythrocytes  were 
added  to  the  media  at  the  schizont  stage  on  the  following 
day  (Day-1).  A'-Acetyl  glucosamine  (GlcNAc,  50  mM)  was 
added  to  the  media  exchanged  from  Day  0  to  eliminate 
residual  asexual  stages  [5].  Both  sets  of  cultures  were 
maintained  for  14  days  to  obtain  stage  I  through  stage  V 
gametocytes,  with  NF54  GlcNAc  treatment  being  withdrawn 
on  Day  10.  NF54  stage  V  gametocytes  were  subsequently 
fed  to  Anopheles  stepliensi  mosquitoes  on  Day  15  to  confirm 
infectivity.  Parasite  stages  for  all  cultures  were  monitored 
using  Giemsa-stained  blood  smears. 

In  order  to  obtain  enriched  early  3D7  gametocyte  stages  I 
and  II,  contaminating  asexual  stages  were  removed  using  CS 
magnetic  affinity  columns  on  Day  2  (Miltenyi  Biotech,  Ger¬ 
many).  A  mixed  culture  of  erythrocytes,  asexual  ring  stages, 
and  early  gametocytes  were  passed  through  a  CS  column. 
Stage  I  and  II  gametocytes  containing  greater  amounts  of 
hemozoin  were  retained  on  the  column  while  the  younger 
rings  and  red  blood  cells  passed  through.  The  column  was 
separated  from  the  magnet  and  the  retained  cells  were  eluted 
as  an  enriched,  positively  selected  early-stage  gametocyte 
cell  fraction  with  a  removal  of  ~80%  asexual  parasites.  The 
enriched  gametocytes  were  returned  to  culture  at  2%  par¬ 
asitemia  in  the  presence  of  GlcNAc.  Giemsa-stained  blood 
smear  cell  counts  of  enriched  gametocytes  were  confirmed 
using  immunofluorescence  cell  counts  with  a  gametocyte 
specific  antibody.  Parasite  preparations  were  washed  three 
times  in  phosphate-buffered  saline  (PBS;  pH  7.4)  containing 
0.1%  bovine  serum  albumin  (BSA)  and  spotted  on  to  each  of 
12  wells  on  Teflon-coated  slides.  Dried  slides  were  fixed  in 
acetone  on  ice  for  30  min  and  stored  at  —20  °C.  After  removal 
from  —  20  °C,  slides  were  washed  in  PBS,  and  blocked  in 
PBS-0.1%  BSA  for  30 min  at  22  °C  in  the  dark.  Anti-Pfsl6 
primary  antibody  (mouse  monoclonal  antibody  93A3A2) 
diluted  1:250  was  applied  in  PBS-0.1%  BSA  for  2h,  the 
slides  were  then  washed  and  fluorescence-labeled  secondary 
antibody  (goat  anti-mouse  immunoglobulin  G  conjugated  to 
AlexaFluor  594,  Molecular  Probes,  USA)  diluted  1:400  was 
added  for  1  h.  The  parasites  were  then  given  a  final  wash  in 
PBS,  the  slides  mounted  in  Vectashield  with  nucleic  acid  stain 
DAPI  (Vector  Laboratories,  USA),  and  stored  in  the  dark  at 
4  °C  until  visualized  using  fluorescence  microscopy.  Use  of 
the  anti-Pfsl6  antibody  allowed  a  comparison  between  the 
number  of  sexual  parasites  (Pfsl6  fluorescent)  and  overall 
infected  erythrocytes  (DAPI  fluorescent). 

2.2.  Microarray  gene  expression  profiling 

Total  RNA  was  isolated  on  each  day  for  13  days  starting 
on  Day  1  for  the  NF54  parasites  and  on  Days  1-3,  6,  8,  and 
12  for  the  3D7  parasites  as  previously  described  [15].  For  the 


purified  3D7  early-stage  gametocyte  samples,  RNA  was  iso¬ 
lated  on  Day  1,  immediately  after  CS  column  separation  on 
Day  2,  and  for  the  following  2  days  (Days  3  and  4).  Label¬ 
ing  of  the  RNA  and  hybridization  to  a  custom-designed  P. 
falciparum  full-genome  high-density  oligonucleotide  array 
(Affymetrix,  USA)  containing  25  mer  probes  to  5159  P.  fal¬ 
ciparum  genes  was  conducted  as  previously  described  [9]. 
The  raw  data  files  from  this  match-only  microarray  were 
analyzed  using  the  Match-Only  Integral  Distribution  algo¬ 
rithm  (MOID)  [16].  The  background  noise  distributions  for 
the  MOID  algorithm  were  calculated  with  the  probe  inten¬ 
sities  of  100  negative  control  genes.  Background  was  then 
subtracted  from  each  raw  probe  intensity  probabilistically 
according  to  the  above  distribution.  During  normalization, 
only  probe  sets  having  more  than  10  probes  and  having 
a  signal  1. 5-fold  higher  than  the  70  percentile  of  back¬ 
ground  distribution  were  taken  into  account.  Normalization 
factors  were  determined  by  scaling  the  average  intensity  of 
genes  in  the  range  between  30  and  90  percentile  to  200.  All 
background  levels  and  normalization  factors  are  available  at 
http://carrier.gnf.org/publications/Gametocyte. 

2.3.  Semi-quantitative  RT-PCR 

Three  micrograms  of  total  RNA,  isolated  from  mixed 
asexual  NF54  parasites  and  Day  7  NF54  gametocytes,  was 
used  for  cDNA  synthesis  using  superscript  reverse  transcrip¬ 
tase  and  oligo(dT)i2_i8  (Invitrogen,  USA).  PCR  using  Extaq 
polymerase  (Takara,  Japan)  was  conducted  on  1  pJ  of  each 
cDNA  template  for  20  cycles  using  an  annealing  temperature 
of  56  °C  with  primers  specific  to  three  hypothetical  genes 
(PF1 1_0214,  PF14_0067,  and  MAL8P1.16)  identified  from 
the  microarray  data  as  expressed  only  during  sexual  devel¬ 
opment  (PF11_0214:  forward  primer  5'-GTA  ACG  AAC 
AGG  AGA  TAA  ATG  GGA  TTT  GTA-3',  reverse  primer 
5'-GGG  TAT  TAC  CAT  AAC  TTT  TCA  ATT  TGT  CTC 
G-3';  PF14  0067:  forward  primer  5'-GCA  TAT  TGT  AAA 
GAA  TGG  TGT  AAA  GCC  ACG-3',  reverse  primer  5'- 
CCA  GCA  CTA  AAT  CCT  TCA  TGT  AAT  ACT  TTA  G-3'; 
MAL8P1.16:  forward  primer  5'-GGT  GTT  GTA  TCA  TCT 
GAA  AAT  ATT  AAG  CTC  C-3',  reverse  primer  5'-CTT 
AGC  CTT  TCT  TGT  TAA  ATT  CTT  CCA  AGC-3'j  and 
the  known  asexual-stage-specific  gene  eba-175  (PF07_0128) 
as  an  asexual  stage  positive  control  (PF07_0128:  forward 
primer  5'-TCA  TAG  TCA  TCA  TGG  AAA  CAG  ACA  AGA 
TCG-3',  reverse  primer  5'-GTA  AAA  TAG  CTC  ATA  CAG 
TAA  TCT  GAT  ACT  GC-3')-  Whenever  possible,  primers 
were  designed  such  that  they  spanned  an  intron  to  allow  for 
the  detection  of  any  possible  contaminating  genomic  DNA 
template.  For  the  genomic  DNA  positive  control  reactions, 
approximately  100  ng  genomic  DNA  isolated  from  mixed 
asexual  stage  NF54  parasites  was  used  as  template. 

2.4.  Ontology-based  pattern  identification 

For  a  detailed  description  of  OPI,  please  refer  to  Zhou 
et  al.  [11].  In  this  study,  38  microarray  expression  data  sets 
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from  various  stages  of  the  P.  falciparum  life  cycle  were  ana¬ 
lyzed  concurrently.  These  included  the  newly  obtained  data 
on  thirteen  NF54  gametocyte  development  stages,  six  3D7 
gametocyte  development  stages,  and  four  3D7  gametocyte 
development  samples  enriched  for  early-stage  gametocytes, 
in  addition  to  previously  published  data  on  seven  sorbitol 
and  seven  temperature  synchronized  3D7  erythrocytic  stages 
(early  ring,  late  ring,  early  trophozoite,  late  trophozoite, 
early  schizont,  late  schizont,  and  merozoite),  and  one  3D7 
sporozoite  stage  [9].  From  here,  OPI  analysis  proceeded  con¬ 
ceptually  as  follows:  (i)  3059  differentially  expressed  genes 
were  identified  using  the  criteria  of  Panova  less  than  0.2 
and  fold  change  in  expression  greater  than  1.5  across  the 
life  cycle  stages  sampled;  (ii)  gene  annotations  for  function¬ 
ally  classified  genes  were  obtained  from  the  GO  Consortium 
[12]  as  of  October,  2004,  with  the  sole  exception  being  the 
sexual  development  annotation  (GO:GNF0004),  which  was 
assigned  manually  to  15  genes  based  on  a  review  of  the  cur¬ 
rent  literature;  (iii)  for  each  classification  category,  all  genes 
sharing  the  same  annotation  were  identified;  (iv)  the  gene 
with  the  expression  pattern  that  best  correlated  with  all  other 
genes  of  the  same  classification  based  on  Pearson’s  corre¬ 
lation  coefficients  was  identified  and  used  as  the  reference 
gene;  (v)  all  other  differentially  expressed  genes  were  then 
ranked  based  on  the  correlation  coefficients  between  their 
life-cycle-dependent  expression  profiles  and  that  of  the  ref¬ 
erence  gene;  (vi)  the  gene  that  had  the  highest  correlation 
with  the  reference  gene  and  also  shared  the  same  classifica¬ 
tion  was  identified  and  a  cluster  was  created  that  included 
both  of  these  genes  as  well  as  any  other  genes  that  did  not 
share  the  same  annotation,  but  fell  in  between  these  two 
annotated  genes  in  the  sorted  gene  list;  (vii)  a  p- value  was 
assigned  to  this  cluster  based  on  the  hypergeometric  distri¬ 
bution  representing  the  probability  that  these  two  genes  in  a 
cluster  of  size  N  genes  would  share  the  same  classification 
by  chance;  (viii)  this  cluster  was  then  expanded  to  include 
all  the  genes  in  the  sorted  gene  list  up  to  the  next  one  pos¬ 
sessing  the  same  classification  as  the  reference  gene;  (ix)  a 
p- value  was  assigned  to  this  new  cluster  just  as  before  using 
the  hypergeometric  distribution;  (x)  this  process  of  cluster 
expansion  and  p- value  assignment  (steps  viii  and  ix)  was 
repeated  until  all  genes  sharing  the  annotation  were  grouped 
together  yielding  the  largest  possible  cluster  for  this  classifi¬ 
cation;  (xi)  the  cluster  with  the  lowest  p- value  was  identified, 
containing  the  highest  concentration  of  genes  belonging  to 
a  particular  classification  in  the  smallest  cluster  size.  The 
entire  process  was  also  repeated  on  log-transformed  expres¬ 
sion  data,  for  several  filtering  Panova  cut-offs,  and  using 
the  average  expression  pattern  of  all  genes  in  each  classifi¬ 
cation  category  instead  of  the  single  reference  gene  in  the 
ranking  step.  The  process  that  yielded  the  cluster  with  the 
minimal  hypergeometric  al  p- value  for  each  classification  cat¬ 
egory  was  chosen  as  the  best  method  to  describe  the  category. 
In  order  to  identify  those  clusters  that  were  statistically  sig¬ 
nificant,  the  entire  process  was  then  repeated  100  times  on 
100  permutated  datasets  where  the  associations  between  gene 


annotations  and  the  gene  expression  profiles  were  random¬ 
ized.  Statistically  significant  clusters  were  defined  as  those 
having  p- values  less  than  90%  of  the  corresponding  values 
in  the  100  permutation  runs. 

2.5.  GBSSR 

For  a  detailed  description  of  GBSSR,  please  refer  to  Kreps 
etal.  [17].  In  this  study,  1000  bootstraps  were  used  to  establish 
the  frequency  distribution  of  all  eligible  6-12  base  pair  strings 
in  the  1000  base  pair  upstream  regions  of  all  genes  in  the 
genome.  Comparison  of  this  genome-wide  string  frequency 
distribution  to  the  frequency  of  strings  in  the  1000  bases 
upstream  regions  of  the  246  genes  in  the  OPI  sexual  develop¬ 
ment  cluster  identified  those  overrepresented  sequences  in  the 
sexual  development  gene  promoter  regions.  These  sequences 
were  considered  putative  cfs-regulatory  elements  for  parasite 
sexual  development  transcriptional  control  mechanisms. 

3.  Results 

3.1.  Gametocyte  cultivation 

Two  14-day  time  courses  of  in  vitro  gametocyte  develop¬ 
ment  were  obtained,  one  using  isolate  NF54  and  one  using 
clone  3D7.  Progression  and  purity  of  gametocyte  develop¬ 
ment  from  stage  I  to  stage  V  was  monitored  by  Giemsa- 
stained  blood  smears  (Fig.  1  and  Fig.  SI).  Although  overall 
gametocyte  development  was  the  same  for  both  time-courses, 
a  slight  difference  in  the  rate  of  development  was  observed 
between  NF54  and  3D7  with  NF54  stage  III  and  stage  IV 
parasites  peaking  on  Days  4  and  6,  respectively,  approxi¬ 
mately  2  days  before  3D7,  which  peaked  for  stage  III  and 
stage  IV  parasites  on  Days  6  and  8,  respectively.  Despite 
taking  measures  to  standardize  gametocyte  production  for 
both  time  courses,  this  discrepancy  could  be  due  to  biolog¬ 
ical  differences  inherent  to  NF54  and  3D7  or  differences  in 


— O— Ring  — A—  Troph/Schiz  — Stagel  —A— Stage  II  Stage  III  — Stage  IV  X  Stage  V 

Fig.  1.  Percentages  of  stages  present  at  each  day  in  the  NF54  gameto¬ 
cyte  development  time  course  were  monitored  using  Giemsa-stained  blood 
smears.  RNA  was  isolated  on  each  day  from  Day  1  to  Day  13  for  hybridiza¬ 
tion  to  the  microarray. 
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Day 

Giemsa  Gametocyte  Count 

Pfs16  IF  Gametocyte  Count 

1 

19.0% 

16.7% 

2 

81  1% 

77.1% 

3 

89.4% 

88.5% 

4 

93.3% 

89.9% 

Fig.  2.  (A)  Percentages  of  stages  present  in  the  CS  magnetic  affinity  col¬ 
umn  purified  3D7  time  course  were  monitored  using  Giemsa-stained  blood 
smears.  Column  separation  was  conducted  to  remove  contaminating  asex¬ 
ual  stages  immediately  after  Day  1  sample  collection.  RNA  was  isolated  on 
each  day  for  hybridization  to  the  microarray.  (B)  Giemsa-stained  blood  smear 
counts  of  early-stage  gametocytes  were  confirmed  to  within  5%  for  each  day 
using  immunofluorescence.  Antibody  against  the  gametocyte-specific  pro¬ 
tein  Pfsl6  allowed  a  comparison  between  the  abundance  of  sexual  parasites 
(Pfsl6  fluorescent)  vs.  overall  infected  erythrocytes  (DAPI  fluorescent). 

difficult-to-control  aspects  of  the  parasite  cultivation  process 
such  as  medium  preparation,  culture  conditions  and  human 
donor-specific  issues  regarding  blood  and  serum  that  can  have 
an  impact  on  parasite  growth  rates.  Feeding  of  NF54  stage 
V  gametocytes  to  Anopheles  stephensi  mosquitoes  on  Day 
15  demonstrated  a  40%  (2/5)  infectivity  rate.  In  addition  to 
the  two  14-day  time-courses,  high  purity  stage  I  and  stage  II 
3D7  gametocytes  were  obtained  using  CS  magnetic  affinity 
columns  and  quantified  by  examination  of  Giemsa-stained 
blood  smears  (Fig.  2A).  Examination  of  these  blood  smears 
by  immunofluorescence  using  antibodies  to  Pfsl6,  which  is 
expressed  at  the  onset  of  gametocytogenesis,  confirmed  the 
Giemsa  stain  parasites  counts  to  within  5%  (Fig.  2B). 

3.2.  Microarray  analysis 

RNA  isolated  from  parasites  on  Days  1-13  for  the  2-week 
NF54  time-course.  Days  1-3,  6,  8,  and  12  for  the  2-week 
3D7  time-course,  and  Days  1-4  for  the  early-stage  3D7 
time-course  was  hybridized  to  a  full  genome  high-density 
oligonucleotide  microarray,  and  expression  values  were  cal¬ 
culated  using  MOID  [16].  Differences  in  the  dynamic  range 
of  expression  values  in  the  NF54  versus  the  3D7  hybridiza¬ 
tions  were  observed  with  the  NF54  having  a  narrower,  but 
more  sensitive  range  than  the  3D7  hybridizations,  resulting 
in  more  genes  being  considered  present  in  the  NF54  time 


course  than  in  the  3D7  time  course.  Although  care  was  taken 
to  ensure  that  the  same  protocols  were  followed  for  each  set 
of  hybridizations,  this  discrepancy  likely  arose  because  dif¬ 
ferent  lab  personnel  conducted  hybridizations  for  each  time 
course  at  separate  times  and  locations  using  different  microar¬ 
ray  scanners.  As  a  result,  visual  differences  occurred  between 
the  NF54  and  3D7  colormetric  depictions  of  expression  lev¬ 
els  for  some  genes  after  normalization.  However,  Pearson’s 
correlation  coefficients  between  all  gene  expression  values 
in  corresponding  days  in  each  time  course  ranged  from  0.75 
(Day  12)  to  0.88  (Day  2),  indicating  that  although  the  absolute 
expression  values  between  the  data  sets  may  have  differed 
slightly,  the  overall  patterns  in  gene  expression  were  quite 
similar  (Table  SI). 

Using  expression  level  and  probe  signal  distribution  cut¬ 
offs  of  E  >  10  and  log  P  <  —0.5,  respectively,  transcripts  were 
found  present  for  an  average  of  3410  genes  per  gameto- 
cyte  microarray  time  point.  These  present  genes  included 
15  well-studied  sexual  development  genes  we  examined 
(Fig.  3).  Further  inspection  of  the  gametocyte  expression 
patterns  for  these  characterized  sexual  stage  genes  revealed 
induction  for  some  during  the  earlier  gametocyte  stages 
I— III  (pfg27/25,  pfs!6,  alpha  tubulin  ii,  pfs48/45),  while  oth¬ 
ers  exhibited  induction  during  the  later  gametocyte  stages 
IV-V  (pfs25 ,  pfs28,  pf77.  dmcl-like  protein )  (Fig.  3).  In 
contrast,  genes  involved  in  processes  such  as  glycolysis  (eno- 
lase,  hexokinase,  triose-phosphate  isomerase,  etc.),  protein 
biosynthesis  (elongation  factors,  initiation  factors  and  ribo- 
somal  proteins),  and  hemoglobin  catabolism  (plasmepsins) 
demonstrated  down-regulation  in  later  stage  gametocytes 
(Fig.  3).  Other  genes,  such  as  those  involved  in  DNA  repli¬ 
cation  (DNA  replication  licensing  factors),  showed  more 
constitutive  expression  across  all  life  cycle  stages  sam¬ 
pled  (Fig.  3).  The  raw  data  files  for  these  gametocyte 
development  microarray  experiments  can  be  accessed  at 
http  ://c  arrier.  gnf.org/publications/Gametocyte . 5 

To  confirm  the  sexual-stage-specific  expression  observed 
for  several  hypothetical  genes,  we  conducted  semi- 
quantitative  RT-PCR  on  total  RNA  isolated  from  para¬ 
sites  on  Day  7  of  the  NF54  time  course  and  from  a 
mixed  asexual  NF54  population.  In  agreement  with  the 
microarray  data,  transcripts  for  the  hypothetical  genes 
PF1 1_0214,  PF14_0067,  and  MAL8P1.16  were  all  shown  to 
be  present  only  in  gametocytes,  whereas  the  known  asexual- 
stage-specific  protein  erythrocyte  binding  antigen  eba-175 
(PF07J3128)  [19]  was  present  only  in  the  mixed  asexual  stage 
population  (Fig.  4). 

3.3.  OPI  sexual  development  analysis 

To  aid  in  the  identification  of  hypothetical  genes  most 
likely  to  be  involved  in  sexual  development,  we  applied 


5  The  raw  data  files  are  also  available  at  PlasmoDB  (release  4.4) 
(www.plasmodb.org)  [18], 
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PFL2510w,  PfCHTl 
13JXJ11.  Pfg27/25 
310w.  Pfsl6 
PFD1050w,  Alpha -Tubulin  ii 
PFI3_0247.  Pfs48/45 
PFI0_0374.  PFsll-1 
PFB0405w,  Pfs230 
14J1294,  PfMAPKl 
1370\v,  PfNEKl 
PF1 1_0147.  PTMAPK2 
Pfg377 

MAL8P1.76,  dmcl-likc  Protein 

PFI0_Q303,  Pfs25 

PPT0_0302.  ITs28 

MAL6P1.213, 1T77 

Mal/»P1 . 189,  hexoKinase 

PFI4_0341.  glucose-6-phosphate  isomerase 

PF  10755c,  6-Phosphofruetokinasc,  putative 

IT  14_0425. 1’ructose-bisphosphate  aldolase 

IT  I4_0378  triose-phosphate  isomerase 

PIT4_0598.  glyceraIdchydc-3-phosphatc  dehydrogenase 

PF'1 1_0208  phosphoglycerate  mutasc,  putative 

PF  1 1 105\v,  phosphoglj  cerate  kinase 

PF10_0 1 15,  enolase 

MAL6P1. 160,  pyruvate  kinase,  punitive 
PF  I3_014l .  L-lactate  dehydrogenase 
MAL6P1.73.  translation  initiation  factor  IF-2.  putative 
PF13_0304  elongation  factor  1  alpha 
PF14_02Q5,  ribosomal  protein  S25,  putative 
PF14_0076,  Plasmcpsin  1 
PFI4_0077,  Plasmcpsin  II 
PFT4_0075,  Plasmcpsin  IV 

PI  14  0177,  DN  A  replication  licensing  factor  MCM2 
PF  13JJ095.  DN  A  replication  licensing  factor  MC.M4 
PFL058(Kv.  DNA  replication  licensing  factor  MCM5 


PFD7_0023,  DNA  replication  licensing  factor  Me  \M7 
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Fig.  4.  Validation  of  microarray  data  with  semi-quantitative  RT-PCR.  Lanes: 
M:  DNA  marker;  A:  asexual  stage  cDNA  template  (from  NF54  mixed 
asexual  population);  S:  sexual  stage  cDNA  template  (from  NF54  Day  7 
gametocytes);  G:  NF54  genomic  DNA  template  (positive  control).  Products 
in  lanes  S  and  absent  from  lanes  A  for  the  hypothetical  genes  PF  11.0214, 
PF14.0067  and  MAL8P1.16  demonstrate  sexual  stage  specific  expression, 
as  shown  by  microarray  analysis.  Likewise,  product  in  lane  A  and  absent  in 
lane  S  for  eba-175  (PF07.0128)  demonstrates  asexual-specific  expression  as 
previously  reported.  For  MAL8P1.16,  primers  were  designed  such  that  they 
spanned  an  intron  to  allow  for  the  detection  of  any  possible  contaminating 
genomic  DNA  template,  of  which  none  was  observed.  All  product  sizes  were 
as  expected. 

a  clustering  algorithm  recently  developed  in  our  labora¬ 
tory  called  ontology-based  pattern  identibcation  [11].  Like 
other  clustering  approaches  to  understanding  gene  function 
from  microarray  data,  OPI  utilizes  the  principle  of  guilt-by- 
association  (GBA),  which  is  based  on  the  observation  that 
for  many  organisms,  genes  involved  in  similar  biological 
processes  and  functions  tend  to  share  similar  mRNA  expres¬ 
sion  profiles  [20].  Hence,  if  one  can  identify  uncharacterized 
genes  that  possess  expression  patterns  similar  to  genes  for 
which  function  is  known,  one  can  rapidly  make  predictions 
that  these  uncharacterized  genes  may  also  be  involved  in  a 
similar  biological  process.  However,  unlike  other  clustering 
approaches,  OPI  uses  existing  classifications  of  gene  func¬ 
tions  in  the  form  of  gene  annotations  to  guide  the  clustering 
process.  By  doing  this,  OPI  is  able  to  generate  empirically- 
optimized  gene  clusters  whose  annotated  and  unannotated 
gene  members,  because  they  share  highly  correlated  expres¬ 
sion  patterns,  have  a  high  likelihood  of  being  involved  in  the 
same  biological  process  based  on  GBA. 

To  guide  OPI  in  identifying  genes  most  likely  to  be 
only  involved  in  sexual  development,  we  created  a  list  of 
15  known  sexual  development  genes  based  on  a  review  of 
the  P.  falciparum  sexual  development  literature  (Table  1) 
[9,21,22].  The  resulting  analysis  produced  a  statistically- 
significant  cluster  (p-value  3.8  x  10“ 10)  containing  246 
genes  that  exhibited  highly  correlated  expression  patterns 
(Pearson  correlation  0.87)  specific  for  the  gametocyte  life 
cycle  stages  sampled.  This  cluster,  called  “Sexual  Develop¬ 
ment”  (GO:GNF0004),  represents  those  genes  most  likely 
to  be  specifically  involved  in  the  sexual  development  pro¬ 


cess  and  can  be  accessed  as  part  of  the  web-accessible  public 
OPI  database  http://carrier.gnf.org/publications/Gametocyte. 
Notably,  a  large  proportion  of  the  genes  present  in  this  clus¬ 
ter  were  categorized  as  hypothetical  (~75%),  reflecting  the 
paucity  of  current  knowledge  regarding  gene  function  in  sex¬ 
ual  development. 

Of  the  15  sexual  development  genes  used  to  seed  the 
OPI  cluster  generation,  11  were  included  in  the  final  sex¬ 
ual  development  cluster.  The  four  annotated  genes  absent 
from  the  cluster  due  to  a  poor  correlation  in  their  life  cycle 
expression  profiles  with  other  genes  in  the  group  were  the 
kinases  pfmapkl  (PF14_0294)  and  pfnekl  (PFL1370w),  the 
early  gametocyte  gene  pfg27/25  (PF13_0011),  and  the  chiti- 
nase  pfchtl  (PFL2510w)  (Table  1).  In  the  case  of  pfchtl,  the 
poor  correlation  was  because  transcript  levels  for  this  gene 
were  in  some  cases  below  the  level  of  detection  in  the  3D7 
time  course.  However,  fox  pfmapkl ,  pfiiekl  and  pfg27/25,  the 
poor  correlation  was  because  in  addition  to  the  gametocyte 
stages,  these  genes  also  showed  significant  expression  in  the 
asexual  stages,  as  has  been  reported  previously  using  more 
traditional  methods  [9,23-25  ].  However,  since  the  goal  was  to 
identify  with  high  confidence  hypothetical  genes  most  likely 
to  be  involved  only  in  the  process  of  sexual  development,  the 
exclusion  of  a  few  known  gametocyte  genes  also  exhibiting 
asexual  expression  was  to  be  expected,  if  not  desired. 

Inspection  of  genes  included  in  the  sexual  development 
cluster  beyond  the  11  used  to  seed  its  generation  yielded 
several  interesting  observations.  Five  putative  proteases 
were  present  [26]  including  one  plasmepsin-like  aspartic 
protease  (PF14_0623),  two  cysteine  proteases  (PFL2290w, 
PF1 1  0298),  and  two  serine  proteases  (PF14  0067, 
MAL8P1.16).  As  for  kinases,  two  putative  serine/threonine- 
protein  Nekl  kinases  (PFL0080c,  PFE1290w),  three 
putative  protein  kinases  (MAL13P1.84,  PFC0485w, 

PFI1290w),  and  a  putative  serine/threonine  protein  kinase 
2  (MAL7P1.100)  were  included.  Metabolic  genes  present 
included  six  members  of  the  type  II  fatty  acid  pathway, 
lipoate  synthase  (MAL13P  1.220),  pyruvate  dehydrogenase 
a  subunit  (PF11_0256),  (3-ketoacyl-ACP  synthase  III  (FabH) 
(PFB0505c),  P-ketoacyl-ACP  reductase  (FabG)  (PFI1125c), 
enoyl-ACP  reductase  (FabI)  (MAL6P1.275),  and  malonyl- 
CoA:ACP  transacylase  (FabD)  (PF  1 3  0066);  two  members 
of  the  heme  biosynthesis  pathway,  8-aminolevulinic  acid 
synthetase  (PFL2210w)  and  porphobilinogen  deaminase 
(PFL0480w);  and  two  members  of  the  mitochondrial  TCA 
cycle,  pyruvate  dehydrogenase  El  component  a-subunit 
(PF1 1  0256)  and  malate  dehydrogenase  (MAL6P1.242). 
Furthermore,  insights  into  the  possible  functions  of  the 
many  hypothetical  genes  in  this  cluster  were  gained  through 


Fig.  3.  Colormetric  expression  map  for  genes  involved  in  various  biological  processes  (y-axis)  across  all  the  life  cycle  stages  sampled  (x-axis).  Bright  red 
indicates  highest  expression;  bright  green  indicates  lowest  expression  (gene  and  array  normalized).  Asexual  stages  ending  with  S  and  T  indicate  respective 
use  of  sorbitol  and  temperature  for  synchronization  as  previously  described  [9].  All  of  the  15  known  sexual  development  genes  examined  (red)  exhibited  up 
regulation  in  gametocytes.  In  contrast,  examples  of  genes  involved  in  glycolysis  (orange),  protein  biosynthesis  (yellow)  and  hemoglobin  catabolism  (green) 
demonstrated  down-regulation  during  gametocytogenesis,  while  examples  of  genes  involved  in  DNA  replication  (blue)  showed  more  constitutive  expression 
across  all  life  cycle  stages  sampled.  The  expression  map  was  generated  using  the  publicly-available  software  Cluster  [67]. 
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Table  1 

Fifteen  known  sexual  stage  genes  manually  given  the  annotation  “Sexual  Development”  (GO:GNF0004)  for  OPI  analysis 


Gene  ID 

Name 

Biological  role 

References 

PFD1050w 

alpha-tubulin  ii 

Male  gametocyte  specific 

[55] 

PFL2510w* 

pfchtl 

Ookinete  invasion  of  mosquito  midgut 

[56,57] 

MAL8P1.76 

dmcl-like  protein 

Involved  in  meiosis 

[9] 

PF14.0294* 

pfmapkl 

Cell  signaling 

[58] 

PF1 1.0147 

pftnapk2 

Cell  signaling 

[24] 

PF10.0374 

pfn-i 

Host  cell  rupture 

[59,60] 

PF13.0011* 

pfg27/25 

Early  sexual  differentiation 

[25,53] 

PFL2405c 

Pfg377 

Female  gametocyte-specific 

[61] 

PFL1370w* 

pfnekl 

Cell  signaling 

[23] 

PFD0310w 

pfsl6 

Early  sexual  differentiation 

[52] 

PFB0405w 

pfs230 

Immune  evasion/gamete  development 

[62] 

PF  10.0303 

pfs25 

Zygote/ookinete  surface  protein 

[63,64] 

PF10.0302 

pfs28 

Zygote/ookinete  surface  protein 

[64] 

PF13.0247 

pfs48/45 

Gametocyte/gamete  surface  protein 

[65] 

MAL6P1.213 

Pf77 

Female  gametocyte-specific 

[66] 

Genes  marked  with  an  asterisk  were  not  included  in  the  final  246  gene  OPI  sexual  development  cluster. 


review  of  gene  information  provided  by  PlasmoDB  as  of 
December  2004  [18]  (Table  S2). 

3.4.  Gametocyte  transcriptional  regulation 

Relatively  little  is  known  about  how  P.  falciparum  regu¬ 
lates  the  transcription  of  genes  important  for  its  pathogenesis 
and  development,  as  no  specific  transcription  factors  and  very 
few  DNA  regulatory  elements  have  been  identified  in  the 
parasite  to  date  [27,28] .  In  yeast,  analysis  of  the  upstream  pro¬ 
moter  regions  of  co-expressed  groups  of  genes  has  proven  to 
be  a  powerful  method  for  the  identification  of  novel  sequence 
elements  controlling  transcription  [29],  With  the  OPI  sex¬ 
ual  development  cluster  on  hand,  representing  246  genes 
exhibiting  highly  correlated  gametocyte-specific  expression 
patterns,  it  was  possible  to  take  this  type  of  approach  to  search 
for  conserved  civ-regulatory  elements  potentially  involved 


in  sexual  development  transcriptional  control  mechanisms 
using  a  pattern  enumeration  algorithm  called  GBSSR  [17]. 

GBSSR  analysis  of  the  upstream  1000  bases  relative  to 
the  start  codon  for  all  genes  in  the  sexual  development  cluster 
revealed  the  presence  of  a  palindromic  sequence,  TGTAN- 
NTACA,  in  65  of  the  246  genes.  Overall,  this  motif  is  found  in 
301  genes  in  the  genome.  However,  the  probability  that  65  of 
them  would  be  included  in  the  sexual  development  cluster  by 
chance  is  7.6  x  10-26.  This  represents  a  highly  significant 
degree  of  enrichment  for  this  motif  in  promoters  of  genes 
exhibiting  gametocyte-specific  mRNA  expression  patterns. 

Two  of  the  genes  whose  upstream  regions  contained  the 
motif  were  the  ookinete  surface  proteins  pfs25  and  pfs28. 
Previously,  Dechering  et  al.  performed  deletion  mapping  on 
the  pfs25  promoter  and  demonstrated  that  elimination  of 
the  region  containing  the  motif  resulted  in  a  marked  reduc¬ 
tion  of  luciferase  reporter  activity  [30]  (Fig.  5a).  Although 
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Fig.  5.  (A)  Alignment  of  examples  of  sequences  from  genes  in  the  OPI  sexual  development  cluster  containing  the  TGTANNTACA  motif  in  their  promoter 
regions.  Gray  highlight  indicates  the  3'-most  end  of  the  promoter  region  upstream  of  pfs25  (PF10_0303)  deleted  by  Dechering  et  al.  (exo5  construct)  that 
resulted  in  a  reduction  of  luciferase  reporter  activity  to  ~5%  that  of  the  full-length  promoter  construct  [30].  B)  Distribution  of  the  TGTANNTACA  motif  in  the 
1000  base  pair  upstream  regions  relative  to  the  start  codon  of  genes  in  the  OPI  sexual  development  cluster  (black)  vs.  genes  in  the  remainder  of  the  genome 
(gray).  The  motif  frequency  is  highest  between  200  and  600  bases  for  genes  in  the  sexual  development  cluster,  but  relatively  even  across  the  upstream  1000 
bases  for  genes  not  in  the  sexual  development  cluster. 
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the  authors  of  this  study  did  not  identify  the  palindromic 
sequence  cited  here,  their  work  strongly  suggests  that  this 
motif  may  indeed  have  functional  relevance.  Further  evidence 
suggesting  a  biological  function  for  this  motif  comes  from 
an  examination  of  its  positional  distribution  relative  to  the 
start  codons  of  genes  it  is  upstream.  For  genes  in  the  sex¬ 
ual  development  cluster,  the  motif  appeared  most  frequently 
approximately  200-600  bases  upstream,  whereas  for  genes 
not  included  in  this  cluster,  its  position  was  essentially  evenly 
distributed  across  the  1000  upstream  bases  (Fig.  5b).  This 
suggests  that  the  mere  presence  of  the  motif  ahead  of  a  gene 
is  likely  not  sufficient  for  gametocyte-specific  expression,  but 
that  context-dependent  factors  also  play  an  important  role.  It 
should  also  be  noted  that  some  proteins  encoded  by  genes 
containing  the  motif  in  their  promoters  do  not  appear  until 
the  later  gamete,  zygote  and  ookinete  stages  (i.e.  pfs25  and 
pfs28).  This  observation  is  consistent  with  the  hypothesis  that 
translational  silencing  may  exist  for  some  sexual  develop¬ 
ment  genes  [31,32]. 

Another  sequence  for  which  there  was  enrichment  in  the 
1000  bases  upstream  of  the  start  codon  of  187  of  the  246  sex¬ 
ual  development  cluster  genes  was  TCCTT.  Although  this 
motif  is  found  fairly  common  throughout  the  genome,  the 
probability  of  finding  this  amount  of  enrichment  in  the  sex¬ 
ual  development  cluster  by  chance  is  1.2  x  10~6.  On  average, 
genes  in  the  sexual  development  cluster  contained  1.8  copies 
of  the  motif  with  18%  of  the  second  copies  being  located 
within  50  bases  of  the  first.  It  is  possible  that  this  redundancy 
confers  specificity.  This  sequence  is  also  the  partial  comple¬ 
ment  of  the  core  sequence  of  the  PAF-1  transcription  factor 
binding  site,  AAGGAATA,  that  was  identified  previously  as 
important  for  pfs25  promotor  activity  in  the  sexual  stages 
[30], 

3.5.  Comparison  of  P.  falciparum  and  P.  berghei 
gametocyte  transcriptomes 

Recently,  Hall  et  al.  published  a  comprehensive  survey 
of  the  life  cycles  of  rodent  malaria  model  organisms  that 
included  an  analysis  of  the  transcriptome  of  immature  and 
mature  Plasmodium  berghei  gametocytes  using  a  genome 
survey  sequence  amplicon  DNA  microarray  [31].  In  all,  tran¬ 
scripts  for  977  genes  were  reported  induced  two-fold  or 
more  in  P.  berghei  gametocytes  compared  to  pure  P.  berghei 
asexual  parasites.  Of  these  977  genes,  504  possessed  P.  fal¬ 
ciparum  orthologues,  which  allowed  a  direct  comparison  to 
the  246  genes  exhibiting  sexual-stage-specific  expression  in 
the  OPI  sexual  development  cluster.  This  comparison  found 
64  genes  that  were  common  to  both  lists,  including  the  well- 
characterized  sexual  stage  genes  encoding  alpha-tubulin  ii, 
dmcl-like  protein ,  and  pfs25. 

Aside  from  biological  differences  between  these  two 
related  species  and  the  fact  that  many  genes  did  not  pos¬ 
sess  orthologues  for  inter-species  comparison,  one  likely 
reason  for  the  apparently  low  overlap  between  these  two 
gene  lists  is  differences  in  the  criteria  set  by  each  group  to 


define  gametocyte-specific  genes.  By  using  a  two-fold  change 
threshold,  Hall  and  coworkers  likely  included  many  genes  in 
their  list  of  977  genes  that,  although  present  in  gametocytes, 
were  not  necessarily  specific  to  this  stage  of  development. 
This  was  evidenced  by  the  presence  of  many  DNA  replica¬ 
tion  genes  in  the  P.  berghei  gametocyte  transcriptome  list 
that  are  also  present  in  P.  berghei  asexual  stages.  OPI  analy¬ 
sis  of  the  P.  falciparum  gametocyte  transcriptome  used  in  this 
study  sets  much  more  stringent  criteria  for  defining  genes  as 
gametocyte-specific.  Because  we  primarily  used  genes  with 
sexual-stage-specific  expression  profiles  to  guide  OPI  sexual 
development  cluster  generation,  genes  expressed  at  signifi¬ 
cant  levels  in  the  asexual  stages  as  well  as  in  gametocytes 
were  excluded  from  the  OPI  sexual  development  cluster,  as 
demonstrated  by  the  absence  of  DNA  replication  genes  such 
as  DNA  licensing  factors  and  DNA  polymerases.  However, 
despite  the  differences  between  the  studies,  each  provides 
important  insights  into  the  biology  of  these  related  malarial 
parasites.  Moreover,  the  list  of  64  genes  common  to  each 
gametocyte  transcriptome  represents  those  genes  that  we  can 
say  with  higher  confidence  play  important  roles  in  sexual 
development. 

3.6.  GO  annotation  OPI  analysis 

In  addition  to  using  OPI  to  systematically  identify  genes 
most  likely  to  be  involved  in  sexual  development  using 
manually  assigned  annotations,  OPI  was  extended  to  predict 
genes  involved  in  a  vast  array  of  other  parasite  processes 
using  GO  annotations  as  a  guide.  The  GO  consortium 
classifies  genes  with  respect  to  biological  processes,  cellular 
components,  and  molecular  functions  using  a  controlled 
vocabulary  [12].  The  October  2004  annotations  for  the  3059 
differentially  regulated  P.  falciparum  genes  identified  in 
this  study  were  obtained  from  the  GO  consortium  website 
(http://www.geneontology.org).  OPI  analysis  using  these 
annotations  as  a  guide  resulted  in  the  generation  of  380  statis¬ 
tically  significant  gene  clusters  representing  202  biological 
processes,  96  molecular  functions,  and  82  cellular  compo¬ 
nents.  The  /> values  for  these  clusters  ranged  from  1.7  x  10-4 
to  1.0  x  10  ~  ,  which  is  several  orders  of  magnitude  lower 
than  those  previously  reported  using  a  k-means  clustering 
approach  [9].  Some  of  the  most  statistically  significant  and 
biologically  interesting  of  these  clusters  are  listed  in  Table  2. 
The  complete  results  of  this  analysis  can  be  accessed  along¬ 
side  the  sexual  development  cluster  at  the  web-accessible 
OPI  database  http://carrier.gnf.org/publications/Gametocyte. 

4.  Discussion 

Although  there  are  well -characterized  differences  in  drug 
sensitivity  between  asexual  and  sexual  stage  parasites  [33], 
the  specific  metabolic  characteristics  unique  to  each  stage 
remain  poorly  understood,  especially  with  regard  to  the  mito¬ 
chondrion  and  apicoplast.  The  mitochondria  of  asexual  and 
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Table  2 

Eighteen  examples  of  statistically  significant  gene  clusters  generated  using  OPI  with  GO  annotations  as  a  guide 


GO# 

Type 

Description 

Total 

p-v alue 

TPR 

Corr 

00:0020033 

BP 

Antigenic  variation 

288 

l.OE-75 

82.6%  (86/104) 

0.72 

00:0007154 

BP 

Cell  communication 

290 

3.2E-44 

50.0%  (84/168) 

0.73 

00:0007155 

BP 

Cell  adhesion 

50 

3.3E-23 

60.0%  (17/28) 

0.83 

00:0009266 

BP 

Response  to  temperature 

70 

4.0E-6 

42.9%  (6/14) 

0.71 

00:0030260 

BP 

Cell  invasion 

87 

6.1E-16 

100%  (10/10) 

0.69 

00:0016052 

BP 

Carbohydrate  catabolism 

82 

3. IE— 10 

50.0%  (11/22) 

0.79 

00:0020020 

CC 

Food  vacuole  (sensu  apicomplexa) 

213 

1.5E-7 

100%  (7/7) 

0.73 

00:0020030 

CC 

Infected  host  cell  surface  knob 

46 

1.3E-24 

76.2%  (16/21) 

0.83 

00:0000502 

CC 

Proteasome  complex  (sensu  eukarya) 

75 

3.5E-21 

76.2%  (16/21) 

0.80 

00:0020008 

CC 

Rhoptry 

4 

3.3E-12 

100%  (4/4) 

0.97 

G0:0020036 

CC 

Maurer’s  cleft 

246 

1.4E-08 

77.0%  (10/13) 

0.79 

00:0020007 

CC 

Apical  complex 

4 

5.0E-11 

66.7%  (4/6) 

0.95 

00:0005198 

MF 

Structural  molecule  activity 

446 

1.0E-11 

52.0%  (39/75) 

0.60 

00:0008092 

MF 

Cytoskeletal  protein  binding 

277 

6.2E-6 

57.1%  (8/14) 

0.80 

00:0003774 

MF 

Motor  activity 

302 

7.5E-8 

66.7%  (10/15) 

0.82 

G0:0008233 

MF 

Peptidase  activity 

93 

3.2E-12 

32.1%  (18/56) 

0.72 

00:0004672 

MF 

Protein  kinase  activity 

242 

2.6E-5 

26.0%  (13/50) 

0.81 

00:0004536 

MF 

Deoxyribonuclease  activity 

32 

9.0E-9 

100%  (4/4) 

0.89 

BP:  biological  process,  CC:  cellular  component,  MF:  molecular  function.  The  column  “Total”  represents  the  total  number  of  genes  in  each  cluster.  The  p-values 
for  each  cluster  were  assigned  using  the  hypergeometric  distribution  representing  the  probability  that  the  number  of  genes  possessing  the  GO  annotation  used 
to  define  the  cluster  would  fall  into  the  cluster  of  size  “Total.”  The  column  true  positive  rate  (TPR)  represents  the  number  of  genes  in  the  cluster  that  possess  the 
GO  annotation  used  to  define  the  cluster  divided  by  the  total  number  of  genes  in  the  data  set  that  possess  the  GO  annotation,  multiplied  by  100  for  a  percentage. 
The  correlation  for  a  cluster  is  defined  as  Pearson’s  correlation  coefficient  between  the  life  cycle  expression  patterns  of  the  two  least  correlated  genes. 


sexual  blood  stage  parasites  are  remarkably  distinct  mor¬ 
phologically,  with  ultrastructural  studies  showing  that  P. 
falciparum  gametocytes  contain  multiple  (4-8)  mitochon¬ 
dria  with  a  relatively  high  density  of  cristae,  whereas  the 
single  mitochondrion  of  asexual  parasites  have  relatively 
few  cristae  [34,35].  These  sexual  stage  mitochondria  are 
thought  to  be  functionally  active,  since  anti-malarials  that 
appear  to  target  the  parasite  electron  transport  chain  show 
some  efficacy  against  gametocytes  [34].  Moreover,  while 
the  asexual  stages  of  the  parasite  life  cycle  in  the  human 
host  rely  heavily  on  glucose  for  the  production  of  energy 
by  fermentation,  it  has  been  proposed  that  a  switch  to  aer¬ 
obic  mitochondria-driven  energy  production  occurs  during 
the  more  oxygen-rich  mosquito  sexual  stages,  coinciding 
with  the  appearance  of  more  cristate  mitochondria  during 
these  stages  [35-37].  Our  data  support  this  hypothesis,  as  we 
show  that  while  transcripts  for  glycolytic  enzymes  are  down 
regulated  in  gametocytes,  transcripts  for  15  of  the  16  mito¬ 
chondrial  TCA  cycle  enzymes  are  present  in  gametocytes, 
with  transcripts  for  pyruvate  dehydrogenase  and  malate  dehy¬ 
drogenase  in  particular  showing  highly  gametocyte-specific 
expression  patterns  as  indicated  by  their  inclusion  in  the  OPI 
sexual  development  cluster.  Furthermore,  evidence  for  the 
importance  of  organelle  function  during  sexual  development 
is  provided  by  our  data,  as  we  showed  significant  up  regula¬ 
tion  of  transcripts  during  gametocytogenesis  for  three  of  the 
six  genes  currently  identified  as  part  of  the  heme  biosynthe¬ 
sis  pathway  in  P.  falciparum  for  which  our  microarray  had 
probes.  These  included  the  mitochrondion-localized  enzyme, 
8-aminolevulinate  synthetase  (1st  step)  [38,39],  as  well  as 
the  enzymes  porphobilinogen  deaminase  (3rd  step)  and  uro¬ 
porphyrinogen  decarboxylase  (5th  step),  both  thought  to  be 


localized  to  the  apicoplast  [39].  De  novo  heme  biosynthe¬ 
sis  may  be  important  during  the  sexual  stages  as  the  parasite 
leaves  the  hemoglobin-rich  blood  environment  of  the  human 
host.  Our  data  also  supports  this  notion  by  showing  a  simul¬ 
taneous  reduction  in  hemoglobin  catabolism  gene  expression 
during  sexual  development  as  compared  to  the  asexual  stages. 

Despite  the  apicoplast  of  sexual  stage  parasites  receiving 
relatively  scarce  attention  in  the  literature,  there  is  some 
evidence  that  the  plastid-like  organelle  plays  a  role  in  sexual 
development.  Thiostrepton  is  an  antibiotic  that  targets  the 
large  subunit  ribosomal  RNA  encoded  by  the  apicoplast 
genome  [40].  Administration  of  this  drug  to  mice  infected 
with  P.  berghei  leads  to  reduction  of  parasite  transmission 
to  mosquitoes,  suggesting  that  the  apicoplast  is  involved  in 
sexual  development  [41].  However,  the  exact  processes  and 
enzymes  that  may  be  important  for  this  putative  sexual  role 
of  the  apicoplast  remain  largely  unexplored.  Inspection  of  the 
genes  involved  in  apicoplast-related  processes  revealed  that 
five  of  the  six  genes  belonging  to  the  apicoplast-localized 
portion  of  type  II  fatty  acid  biosynthesis  pathway  were 
significantly  up  regulated  during  the  gametocyte  stages. 
Moreover,  gametocyte-specific  expression  was  also  observed 
for  the  genes  encoding  the  enzymes  lipoate  synthase  and 
pyruvate  dehydrogenase  a  subunit,  both  thought  to  be 
cytosolically  localized.  These  enzymes  are  responsible  for 
generating  the  fatty-acid  precursor  molecules  lipoate  and 
acetyl-CoA.  It  is  unclear  why  fatty  acid  synthesis  would  be 
important  to  sexual  stage  parasites,  although  one  possible 
explanation  is  the  likely  requirement  for  a  substantial 
increase  in  synthesis  of  membrane  precursors  for  growth  of 
the  gametocyte  and  the  process  of  exflagellation.  Regardless, 
because  many  of  these  genes  have  a  prokaryotic  origin,  they 
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provide  attractive  targets  for  the  development  of  potentially 
low  toxicity  drugs  with  transmission-blocking  properties. 

The  proteases  of  Apicomplexan  parasites  have  long  been 
considered  attractive  drug  targets  due  to  the  many  impor¬ 
tant  biological  roles  they  play  and  the  technical  feasibility 
of  designing  specific  chemical  inhibitors  against  them  [42]. 
Although  protease  activities  have  been  implicated  in  schizont 
rupture  during  the  intraerythrocytic  cycle  of  Plasmodium  par¬ 
asites  [43,44],  it  is  not  known  if  similar  activities  are  involved 
in  the  sexual  stages.  The  five  putative  proteases  [26]  identi¬ 
fied  as  having  gametocyte-specific  expression  from  the  OPI 
sexual  development  cluster  make  attractive  targets  for  the 
development  of  transmission-blocking  drugs.  One  of  these 
genes,  MAL8P1 . 16,  encodes  a  putative  member  of  the  rhom¬ 
boid  protease  family.  Rhomboids  are  intramembrane  serine 
proteases  that  have  substrate  cleavage  sites  within  their  trans¬ 
membrane  domains.  In  Drosophila  melanogaster,  they  have 
been  shown  to  cleave  proteins  containing  epidermal  growth 
factor  (EGF)  domains,  allowing  the  EGF  ligands  to  bind  to 
EGF  receptors  [45].  Recent  evidence  suggests  that  these  pro¬ 
teases  also  play  a  role  in  activating  signals  for  intercellular 
communication  in  the  pathogenic  Gram-negative  bacterium 
Providencia  stuartii  [46,47].  Potential  cleavage  of  surface 
proteins  by  this  putative  rhomboid  protease  during  sexual 
development  in  P.  falciparum  has  yet  to  be  investigated, 
despite  evidence  that  rhomboid  proteases  are  expressed  in 
the  oocyst  stage  of  the  related  Apicomplexan  parasite  in  Tox¬ 
oplasma  gondii  [48]. 

The  intracellular  signaling  pathways  responsible  for  the 
triggering  of  parasite  sexual  development  in  the  vertebrate 
host  remain  largely  uncharacterized.  Despite  several  envi¬ 
ronmental  factors  being  shown  to  induce  gametocytogenesis 
both  in  vitro  and  in  vivo  [49],  chromosomal  deletions  being 
associated  with  a  loss  of  the  ability  to  produce  gametocytes 
[50,51],  and  the  identification  of  early  gametocyte  develop¬ 
ment  markers  such  as  Pfsl6  [52]  and  Pfg27/25  [25,53],  the 
details  regarding  the  specific  signaling  mechanisms  respon¬ 
sible  for  the  switch  from  a  asexual  to  sexual  mode  of  repli¬ 
cation  remain  elusive.  Recently,  Gardiner  et  al.  identified  a 
gene,  pfgig  (PFI1720w),  that  when  genetically  manipulated 
through  silencing  or  over-expression,  resulted  in  a  respective 
reduction  or  induction  of  gametocyte  production  [54].  As  the 
authors  of  this  study  noted,  our  previous  life  cycle  microarray 
data  showed  no  expression  for  this  gene  in  late-stage  game¬ 
tocytes  [9].  However,  the  early-stage  gametocyte  microarray 
data  presented  herein  shows  significant  expression  for  this 
gene,  thus  supporting  the  notion  that  it  is  involved  in  early  sex¬ 
ual  development  events.  Furthermore,  the  six  kinases  demon¬ 
strated  here  to  have  gametocyte-specific  expression  patterns, 
as  well  as  the  sequence  motifs  enriched  in  upstream  regions 
of  genes  in  the  sexual  development  cluster,  both  represent 
promising  leads  for  further  study  toward  elucidating  the  com¬ 
ponents  of  sexual  development  signaling  pathways. 

Perhaps  the  most  interesting  and  exciting  aspect  of  the  sex¬ 
ual  development  cluster  identified  in  this  study  is  the  large 
abundance  of  hypothetical  genes  it  possesses,  and  it  is  in 


this  regard  that  analyses  such  as  the  one  presented  here  hold 
the  most  promise.  Understanding  the  biological  role  hypo¬ 
thetical  genes  play  is  of  particular  importance  with  regard  to 
pathogenic  organisms  such  as  P.  falciparum,  where  over  60% 
of  the  predicted  ORFs  possess  no  putative  function  and  there¬ 
fore  are  ignored  by  rational  drug  and  vaccine  design  efforts. 
Sequence-based  high-throughput  genomic  and  proteomic 
research  can  act  as  powerful  complementary  approaches 
where  more  traditional  methods  of  understanding  gene  func¬ 
tion  have  been  slow  to  yield  results.  However,  it  remains 
a  daunting  challenge  to  analyze  the  large  amounts  of  data 
obtained  by  these  genomics-based  studies  in  order  to  gener¬ 
ate  insightful  hypotheses  that  give  rise  to  biologically  relevant 
conclusions.  Of  the  246  genes  in  the  OPI  sexual  develop¬ 
ment  cluster  described  herein,  197  were  part  of  a  ^-cluster 
with  partially  gametocyte -like  expression  patterns  in  our  pre¬ 
vious  life  cycle  microarray  work  using  a  single  late-stage 
gametocyte  time  point  [9].  However,  due  to  limitations  in 
the  k-means  clustering  approach  employed  in  this  past  study, 
these  197  genes  were  scattered  across  six  of  fifteen  clus¬ 
ters  and  intermingled  with  genes  possessing  non-gametocyte 
specific  expression  patterns.  The  sexual  development  cluster 
generated  here  using  a  more  stringent  OPI  approach  repre¬ 
sents  a  consolidation  of  gametocyte-specific  genes  into  one 
highly  correlated  gene  cluster.  This  consolidation  not  only 
allowed  us  to  say  with  higher  confidence  that  any  particular 
gene  contained  within  the  cluster  is  involved  specifically  in 
sexual  development,  but  also  identified  49  genes,  some  of 
which  have  earlier  gametocyte-specific  expression  patterns, 
that  were  previously  overlooked  using  k-means  clustering. 

In  its  current  state,  the  generic  nature  of  OPI  has  both 
strengths  and  weaknesses.  One  shortcoming  of  OPI  is  that 
it  has  a  tendency  to  generate  some  clusters  that  are  statis¬ 
tically  significant  but  may  not  be  biologically  meaningful. 
For  example,  clusters  possessing  more  than  500  genes  can  be 
challenging  to  interpret  biologically  and  will  contain  many 
false  positives.  Additionally,  many  of  the  clusters  defined  by 
molecular  function  annotation  such  as  proteases  and  kinases 
that  operate  in  various  stages  and  locations  of  the  parasite  may 
also  be  irrelevant  if  only  life  cycle  expression  data  is  used  as 
the  basis  for  analysis.  Fikewise,  genes  involved  in  negative 
regulation  of  a  process  are  unlikely  to  share  an  expression 
profile  with  the  genes  that  they  regulate  and  therefore  may  be 
missed  in  the  OPI  clustering  process.  Another  concern  with 
using  OPI  to  study  organisms  such  as  P.  falciparum  whose 
genome  is  poorly  annotated  is  that  many  important  insights 
will  be  missed.  However,  the  primary  goal  of  OPI  is  not  to 
comprehensively  describe  every  detail  of  the  parasite  biology, 
but  rather  to  use  current  information  to  make  studies  on  indi¬ 
vidual  gene  functions  using  traditional  methods  more  focused 
and  efficient.  As  gene  annotations  for  P.  falciparum  improve 
over  time  and  as  the  repertoire  of  malaria  life  cycle  stages 
profiled  using  microarrays  increases  to  include  later  sexual 
stage  and  liver  stage  parasites,  the  specificity  and  quality  of 
gene  clusters  obtained  via  OPI  analysis  will  undoubtedly  be 
further  refined.  Moreover,  the  generic  nature  of  OPI  makes 
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it  amenable  to  the  analysis  of  other  types  of  high-throughput 
genome- wide  data  generated  by  methods  such  as  quantitative 
proteomics  and  large-scale  yeast-two  hybrid  experiments, 
which  can  address  the  functional  states  of  proteins  in  a  more 
direct  manner  than  is  possible  using  solely  transcriptional 
analysis.  Integration  of  data  from  these  varying  approaches 
will  help  lay  the  groundwork  for  a  malaria  systems  biology 
aimed  at  generating  more  robust  hypotheses  that  will  be  crit¬ 
ical  to  advancing  the  understanding  of  this  important  disease 
organism. 
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